Accelerating weighted random sampling without replacement
Abstract
Random sampling from discrete populations is one of the basic primitives in statistical computing. This article briefly introduces weighted and unweighted sampling, with and without replacement. Weighted sampling without replacement appears to be the most difficult case to implement efficiently, which may be one reason why the R implementation is slow for large problem sizes. The article presents four alternative implementations of weighted sampling without replacement and analyzes their run time and correctness.
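To make the problem concrete, here is a minimal Python sketch of weighted sampling without replacement under the usual sequential definition: draw one item with probability proportional to its weight, remove it, and repeat. It is not one of the four implementations the article presents, and the function name `sequential_weighted_sample` is hypothetical. Each draw scans the remaining items, so the cost grows with the product of population size and sample size, which illustrates why this case is hard to make fast.

```python
import random
from typing import List, Sequence


def sequential_weighted_sample(
    weights: Sequence[float], m: int, rng: random.Random
) -> List[int]:
    """Return m distinct indices, drawn one at a time with
    probability proportional to the remaining weights."""
    if m > len(weights):
        raise ValueError("cannot draw more items than the population holds")
    if any(w <= 0 for w in weights):
        raise ValueError("this sketch assumes strictly positive weights")

    remaining = list(range(len(weights)))
    chosen: List[int] = []
    for _ in range(m):
        total = sum(weights[i] for i in remaining)
        threshold = rng.random() * total
        # Fall back to the last item if rounding keeps the running
        # sum from ever exceeding the threshold.
        pos = len(remaining) - 1
        acc = 0.0
        for p, i in enumerate(remaining):
            acc += weights[i]
            if threshold < acc:
                pos = p
                break
        chosen.append(remaining.pop(pos))
    return chosen


print(sequential_weighted_sample([1.0, 2.0, 3.0, 4.0], 3, random.Random(0)))
```

The order of the returned indices is the order in which they were drawn, so items with larger weights tend to appear earlier.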
Similar resources
Weighted Random Sampling (2005; Efraimidis, Spirakis)
The problem of random sampling without replacement (RS) calls for the selection of m distinct random items out of a population of size n. If all items have the same probability to be selected, the problem is known as uniform RS. Uniform random sampling in one pass is discussed in [1, 5, 10]. Reservoir-type uniform sampling algorithms over data streams are discussed in [11]. A parallel uniform r...
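The one-pass uniform case mentioned above can be illustrated with the classic reservoir scheme (often called Algorithm R). The following Python sketch is an illustration only, not code from the cited work; the name `reservoir_sample` is hypothetical.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], m: int, rng: random.Random) -> List[T]:
    """Uniformly sample m distinct items from a stream in one pass."""
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < m:
            # The first m items fill the reservoir.
            reservoir.append(item)
        else:
            # Item i (0-based) replaces a random slot with probability m / (i + 1).
            j = rng.randrange(i + 1)
            if j < m:
                reservoir[j] = item
    return reservoir


print(reservoir_sample(range(100), 5, random.Random(0)))
```

If the stream holds fewer than m items, the sketch simply returns all of them.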
Weighted Random Sampling over Data Streams
In this work, we present a comprehensive treatment of weighted random sampling (WRS) over data streams. More precisely, we examine two natural interpretations of the item weights, describe an existing algorithm for each case ([2,4]), discuss sampling with and without replacement and show adaptations of the algorithms for several WRS problems and evolving data streams.
Weighted Sampling Without Replacement from Data Streams
Weighted sampling without replacement has proved to be a very important tool in designing new algorithms. Efraimidis and Spirakis (IPL 2006) presented an algorithm for weighted sampling without replacement from data streams. Their algorithm works under the assumption of precise computations over the interval [0, 1]. Cohen and Kaplan (VLDB 2008) used similar methods for their bottom-k sketches. ...
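The Efraimidis and Spirakis scheme is commonly described as follows: give each item with weight w a key u^(1/w), where u is uniform on (0, 1), and keep the m items with the largest keys. The Python sketch below follows that description under stated assumptions. It compares log(u)/w instead of u^(1/w), which preserves the ordering and avoids underflow for small weights. The function name `keyed_weighted_sample` and the `(item, weight)` stream format are hypothetical, and items are assumed to be mutually comparable so that ties between keys can be broken.

```python
import heapq
import math
import random
from typing import Iterable, List, Tuple


def keyed_weighted_sample(
    stream: Iterable[Tuple[str, float]], m: int, rng: random.Random
) -> List[str]:
    """One-pass weighted sampling without replacement using random keys."""
    heap: List[Tuple[float, str]] = []  # min-heap holding the m largest keys
    for item, weight in stream:
        if weight <= 0:
            raise ValueError("this sketch assumes strictly positive weights")
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        u = 1.0 - rng.random()
        key = math.log(u) / weight
        if len(heap) < m:
            heapq.heappush(heap, (key, item))
        elif key > heap[0][0]:
            # The new key beats the smallest key currently kept.
            heapq.heapreplace(heap, (key, item))
    return [item for _, item in heap]


data = [("a", 1.0), ("b", 2.0), ("c", 3.0), ("d", 4.0)]
print(keyed_weighted_sample(data, 2, random.Random(1)))
```

The heap keeps memory proportional to m rather than to the stream length. The precision concern raised in the abstract above applies here too: the sketch relies on ordinary floating-point arithmetic.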
A Direct Bootstrap Method for Complex Sampling Designs From a Finite Population
In complex designs, classical bootstrap methods result in a biased variance estimator when the sampling design is not taken into account. Resampled units are usually rescaled or weighted in order to achieve unbiasedness in the linear case. In the present article, we propose novel resampling methods that may be directly applied to variance estimation. These methods consist of selecting subsample...
Lattice Paths, Sampling without Replacement, and the Kernel Method
In this work we consider weighted lattice paths in the quarter plane N0 × N0. The steps are (m, n) → (m − 1, n), weighted by m/(m + n), and (m, n) → (m, n − 1), weighted by n/(m + n). The lattice paths are absorbed at lines y = x/t − s/t with t ∈ N and s ∈ N0. We provide explicit formulæ for the sum of the weights of paths, st...